IJIAET International Journal of Innovative
Analyses and Emerging Technology


| e-ISSN: 2792-4025 | http://openaccessjournals.eu | Volume: 3 Issue: 1 


Leukemia Cancer Cells Segmentation and Classification Using Machine 
Learning 


Arunkumar V 
Master of Science, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, India 


G. Visalaxi 
Assistant Professor, Department of Computer Science, Bharath Institute of Higher Education and 
Research, Chennai, Tamil Nadu, India 


Abstract: Leukaemia is a form of cancer that can be fatal. Survival rates for patients improve only
with better detection and diagnosis. Currently, pathologists examine microscopic images for signs of
cancer or blood disorders, looking for differences in the texture, geometry, colour, and statistical
properties of the images. This study details a variety of feature extraction strategies for identifying
leukaemia in micrographs of blood samples. Image analysis is central to this process. We begin with
the fundamentals of cell biology and then demonstrate our proposed method in action. To keep costs
as low as possible, we work with images alone. MATLAB is used as the cancer cell detection tool.


Introduction

The goal of this study is to use image processing methods to diagnose leukaemia at an earlier stage [1].
Leukaemia, often called blood cancer, is characterised by abnormal, unchecked growth of white blood
cells (leukocytes) in the bone marrow [2]. Acute Lymphoblastic Leukemia (ALL), a subtype of
leukaemia, disproportionately affects children. Because it progresses rapidly, acute leukaemia can be
fatal within a matter of months if left untreated. The signs and symptoms of ALL are non-specific,
which can lead to misdiagnosis [3-7]. Manual blood cell classification is time-consuming and
error-prone, so even haematologists find it difficult to categorise leukaemia cells correctly. Early
diagnosis is therefore essential to giving the patient the correct treatment. Because it requires no
specialised laboratory equipment, image-based detection is a quick and low-cost option. Using
MATLAB as the processing tool, we focus on how variations in cell geometry (area and perimeter)
and statistical features such as mean and standard deviation distinguish white blood cells from other
blood components [8-11]. Once these features have been extracted, the different forms of leukaemia
can be classified according to their characteristic morphologies. In this study, we employ the Naive
Bayes (NB) and Support Vector Machine (SVM) algorithms to improve upon existing methods
[12-16].
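
The geometric and statistical features named above can be illustrated with a short MATLAB sketch. It is not the implementation used in this study: the pixel values, the threshold of 50, and the variable names are hypothetical, and the boundary-pixel count is only a simple stand-in for a perimeter estimate. It uses base MATLAB only; with the Image Processing Toolbox, regionprops could report area and perimeter directly.

```matlab
% Hypothetical 6x6 grayscale patch containing one bright cell.
cellImg = [10 10  10  10 10 10; ...
           10 80  90  85 10 10; ...
           10 95 120 100 10 10; ...
           10 88 110  92 10 10; ...
           10 10  10  10 10 10; ...
           10 10  10  10 10 10];
mask = cellImg > 50;               % crude foreground mask (assumed threshold)

% Geometric feature: area = number of foreground pixels.
area = sum(mask(:));

% Geometric feature: boundary pixels = foreground pixels with at least
% one background 4-neighbour (a simple stand-in for the perimeter).
padded = false(size(mask) + 2);
padded(2:end-1, 2:end-1) = mask;
interior = padded(1:end-2, 2:end-1) & padded(3:end, 2:end-1) & ...
           padded(2:end-1, 1:end-2) & padded(2:end-1, 3:end);
boundaryCount = sum(mask(:) & ~interior(:));

% Statistical features over the pixels inside the mask.
vals = cellImg(mask);
meanIntensity = mean(vals);
stdIntensity = std(vals);

% One feature vector per segmented cell, ready for a classifier.
features = [area, boundaryCount, meanIntensity, stdIntensity];
```

Each segmented cell yields one such row, and the rows for all cells form the feature matrix passed to a classifier such as NB or SVM.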


Semi-supervised learning has recently attracted considerable attention because it can alleviate the need
for the huge labelled datasets otherwise required to train deep models based on artificial neural
networks [17-21]. Acquiring labelled data can be time-consuming, expensive, and technically
demanding [22]. Labelling or segmenting large volumes of medical imaging data reliably, for example,
takes a great deal of time and effort from a team of highly skilled radiologists or technologists. In the
recent infant brain MRI segmentation challenge (iSeg2017), for instance, neuroradiologists spent an
average of one week manually segmenting each brain MRI scan [23]. In many fields, however, and
particularly in medical imaging, large amounts of unlabelled data are easy and cheap to obtain [24].
Compared with other machine learning techniques, a major strength of deep learning approaches is
their ability to model complicated, high-dimensional datasets by means of learned feature
representations [25].


Neural networks have proven highly effective at drawing inferences from high-dimensional image
data, leading to state-of-the-art results in computer vision [26-29]. However, most deep learning
algorithms are supervised, meaning that they learn to make predictions or classify data from labelled
training examples [30]. These algorithms have been adapted in many ways to accommodate
semi-supervised or unsupervised learning tasks, some of which are discussed in the following section.
Unsupervised learning approaches are typically used to pre-train models for supervised learning tasks
[31]. Clustering methods, which fall under the umbrella of unsupervised learning, organise unlabelled
data into groups with shared characteristics, but they cannot take existing category labels into account
[32]. Moreover, although clustering algorithms do well on low-dimensional data, the "curse of
dimensionality" typically hinders their performance on high-dimensional data [33].


As the distance between data samples grows in high-dimensional spaces, clustering algorithms need an
impractically large number of data points to quantify the effect of each factor accurately and draw
reliable conclusions [34-39]. On MNIST, one of the most common baseline datasets in computer
vision, the well-known k-means clustering technique achieves an accuracy of only around 55%. In this
research, we employ the recently introduced deep embedded clustering (DEC) algorithm to build a
semi-supervised deep learning approach that is accurate, adaptable, and computationally inexpensive.
DEC combines a deep stacked autoencoder with a clustering technique and refines the cluster centres
by iteratively optimising a cost function based on target probability distributions [40]. Building on
this, we describe a new training strategy for a semi-supervised approach that incorporates a clustering
layer into a deep convolutional neural network (CNN), so that the model learns feature
representations from unlabelled data while remaining consistent with the labelled data [41-43].
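
The cost function that DEC optimises can be written out explicitly. The block below sketches the formulation usually given for DEC; it is not stated in this article, and the symbols are notation introduced here for illustration: $z_i$ is the embedded representation of sample $i$, $\mu_j$ is the centre of cluster $j$, $\alpha$ is the degrees of freedom of the Student's t kernel (commonly set to 1), and $f_j$ is the soft cluster frequency.

```latex
q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}
              {\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}},
\qquad
p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}},
\quad f_j = \sum_i q_{ij},
\qquad
L = \mathrm{KL}(P \,\Vert\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Here $q_{ij}$ is the soft assignment of sample $i$ to cluster $j$, $p_{ij}$ is the sharpened target distribution derived from it, and minimising $L$ updates the cluster centres and the encoder weights together.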


We call this technique Semi-Supervised Learning with Deep Embedded Clustering (SSLDEC). It is
applied both to standard image classification datasets used to evaluate and compare semi-supervised
learning algorithms and to a difficult medical image segmentation task, namely isointense infant brain
MRI segmentation based on the iSeg2017 challenge. When only a subset of the data is labelled, the
proposed method achieves competitive results on MNIST, SVHN, and iSeg2017 [44-49].


Types 

Image processing employs both analogue and digital approaches. Analogue, or visual, techniques are
used for hard copies such as printouts and photographs [50]. When analysing images visually, image
analysts apply a number of interpretation principles [51]. Image processing in this sense depends not
only on the area under study but also on the analyst's expertise [52]. Association is another powerful
tool in visual image processing, so analysts combine direct experience with collateral information
when processing images [53-59]. Digital processing techniques manipulate digital images using
computers; they are needed because the raw data from imaging sensors on board satellite platforms
contains flaws [60]. To eliminate these flaws and recover the original information, the data must go
through several stages of processing [61-65]. All types of digital data pass through three general
phases: pre-processing, enhancement and display, and information extraction [66]. The hierarchy of
image processing is shown in Fig. 1.


It is often necessary to save an image so that it can be moved to a disc or opened in a different
application [67]. In this case the image is written to a file rather than read from one. The imwrite
function in MATLAB does this [68-71]. It can save an image in any of the file formats that MATLAB
supports [72].


    % Tutorial M-file
    % Created by: Someone
    % Created on: 9/11/03
    % Last revised: 12/5/03

    % load a color .bmp image and convert to grayscale
    A = 'C:\MATLAB6p5\work\splash2.bmp';    % designate A as the specified file
    B = imread(A, 'bmp');                   % matrix B loaded with bitmap file specified by A
    figure(1), imshow(B);                   % show image B in figure window 1
    C = rgb2gray(B);                        % convert color image to grayscale image
    figure(2), imshow(C);                   % show image C in figure window 2
    % save the image as a .jpg image
    imwrite(C, 'C:\MATLAB6p5\work\splash2new.jpg', 'jpg')    % save grayscale image C as jpeg file

Figure 1: M-file for Saving an Image


Histograms are bar charts that show how data is distributed. In image processing, a histogram shows
the distribution of pixel values in an image [73]. A histogram is a useful tool for finding out which
pixel values are significant in a given image, and this information can then be used to adjust the image
as required [74-79]. Histogram information is used for contrast enhancement and thresholding. The
imhist function generates a histogram directly from an image. The histeq function increases contrast,
while the graythresh and im2bw functions are used for thresholding. Figure 2 demonstrates imhist,
imadjust, graythresh, and im2bw. The histogram of a contrast-enhanced image can be viewed by
applying imhist to the image produced by histeq [80].
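
To show what histogram-based thresholding involves, the sketch below builds a 256-bin histogram and selects a threshold with Otsu's method, which is the method graythresh uses. It is written in base MATLAB on a hypothetical 4x4 image, so the pixel values and variable names are assumptions made for illustration only. Note that graythresh itself returns a normalised level between 0 and 1, whereas this sketch works in grey levels 0 to 255.

```matlab
% Hypothetical 8-bit image: dark background with a bright 2x2 blob.
I = uint8([20  22  25 21; ...
           23 200 210 24; ...
           22 205 215 20; ...
           21  23  22 24]);

% Histogram: number of pixels at each of the 256 grey levels.
counts = accumarray(double(I(:)) + 1, 1, [256 1]);

% Otsu's method: pick the level that maximises the between-class
% variance of the two pixel groups it creates.
p = counts / sum(counts);          % probability of each grey level
levels = (0:255)';
omega = cumsum(p);                 % probability of the class "<= level"
mu = cumsum(p .* levels);          % cumulative mean up to each level
muTotal = mu(end);
sigmaB = (muTotal * omega - mu).^2 ./ (omega .* (1 - omega));
sigmaB(~isfinite(sigmaB)) = 0;     % guard the 0/0 cases at the ends
[~, idx] = max(sigmaB);
threshold = levels(idx);

BW = I > threshold;                % binary image: true for the bright blob
```

Any level between the two pixel groups gives the same between-class variance here; max returns the first such level, so the threshold falls at the top of the dark group.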


Negative 

A negative image is the inverse of its source image. In an 8-bit image, all 0s become 255s and all 255s
become 0s, and every pixel value between these two extremes is inverted in the same way. The new
image appears as the reverse of the original [81-85]. This can be done with the imadjust function, as
shown in Figure 2. Taking the complement of the image is another way to generate a negative [86].
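
The complement approach can be sketched in a few lines. The example assumes an 8-bit (uint8) grayscale image and uses a hypothetical 2x3 patch; it relies only on base MATLAB arithmetic. With the Image Processing Toolbox, imcomplement(I) gives the same result for uint8 input.

```matlab
% Hypothetical 2x3 8-bit grayscale patch (values chosen for illustration).
I = uint8([0 64 128; 200 254 255]);

% Complement of an 8-bit image: each pixel value p becomes 255 - p.
neg = 255 - I;

% Applying the complement twice recovers the original image.
restored = 255 - neg;
```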


    % Tutorial M-file
    % Created by: Someone
    % Created on: 9/11/03
    % Last revised: 12/8/03

    A = 'C:\MATLAB6p5\work\splash2new.jpg';    % designate A as the specified file
    B = im2double(imread(A, 'jpg'));           % read and convert loaded image to class double
    figure(1), imhist(B, 256);                 % display histogram in figure window 1
    D = imadjust(B, [0 1], [1 0]);             % create negative of original image
    figure(2), imshow(D);                      % display negative in figure window 2
    E = histeq(B);                             % enhance contrast to equally spaced bins
    figure(3), imshow(E);                      % display contrast enhancement in figure window 3
    F = graythresh(B);                         % calculate threshold level
    G = im2bw(B, F);                           % convert grayscale image to binary according to threshold
    figure(4), imshow(G)                       % display binary image

Figure 2: M-file for Creating Histogram, Negative, Contrast Enhanced and Binary Images from the
Image


Leukaemia is among the most serious blood cancers and affects both young and old. It is a
malignancy that starts in the blood cells and spreads to other parts of the body [87-91]. Without blood,
the body's metabolic processes would stall. The body continually renews itself through cell division
[92], and for new cells to form, old ones must die. In cancer, old cells do not die but remain in the
blood, so new cells are unable to flourish [93]. As a result, the normal flow of blood is disrupted and
the production of white blood cells becomes erratic.


Structures and Functions of Human Blood and Cells

The stem cells in the bone marrow differentiate into many types of blood cells. The components of
blood are as follows. Erythrocytes, or red blood cells, transport oxygen to tissues and carry carbon
dioxide away [94-99]. White blood cells (leukocytes) eliminate pathogens and protect the body from
illness. Types of white blood cell include lymphocytes, monocytes, eosinophils, basophils, and
neutrophils; the various forms of leukaemia correspond to distinct subsets of white blood cells.
Platelets help to stop bleeding [100].


Types of Leukemia 


Children aged 1-12 years and adults over 40 have the highest risk of developing acute lymphocytic
leukaemia (ALL), also known as Acute Lymphoblastic Leukemia [101-105]. In ALL, the
lymphocytes, a type of white blood cell (WBC), are affected. ALL is more prevalent in males than in
females.

Acute Myeloid Leukemia (AML) can develop in patients of any age, from children as young as one
year to the elderly [106]. It is characterised by an abnormally enlarged spleen and bone pain. In this
disease the myeloid stem cells are affected.

Chronic Leukemia: In the early phases, the body shows no signs of disease [107], because the aberrant
cells do not yet interfere with normal cell function [108]. The disease progresses slowly, affects a wide
range of blood cells, and does not manifest until late on. The terminal phase is incurable.

Chronic Lymphocytic Leukemia (CLL): Senior citizens with age-related disorders are at increased
risk. The lymphocytes are affected, and the disease shows no symptoms in its early stages [109].


To summarise, leukaemia is a form of cancer that starts in blood cells and spreads to other organs and
tissues [110], and it affects both young and old. Without blood, the body's metabolic processes would
stall. The body depends on the continual division of cells [111], and new cells can form only as old
ones die. In cancer, old cells do not die but remain in the blood, so new cells are unable to flourish
[112]. The normal flow of blood is disrupted as a result, and the production of white blood cells
becomes erratic [113-117].


System Study


Feasibility Study


In this stage, the viability of the project is assessed and a business proposal is put forward with a very
general plan for the project and some preliminary cost estimates [118]. The feasibility of the proposed
system is studied during system analysis [119], to ensure that the proposed system does not become a
burden to the organisation [120]. Understanding the system's primary requirements is essential to a
feasibility study. The feasibility analysis examines the problem at hand and the information needs of
the stakeholders [121-125]. Its goal is to determine the time, money, and effort needed to implement
an information system solution, and whether doing so is possible at all [126]. The analyst may use any
number of techniques, of which the following are among the most common:


Creating and sending out surveys to stakeholders, such as potential users of the system, who have an
interest in the information system [127].


Observing current users of the system to determine their requirements, their level of satisfaction, and
areas for improvement [128].


Gathering, reviewing, and evaluating all existing system documentation, including reports, diagrams,
procedures, manuals, and other written materials [129].


Modelling, observing, and simulating the work activities of the current system.


Existing System


Naive Bayes (NB):


Naive Bayes classifiers are a family of simple probabilistic classifiers. They are based on Bayes'
theorem with strong (naive) independence assumptions between the features. They are among the
simplest Bayesian network models, but combined with kernel density estimation they can achieve
high accuracy [130]. Naive Bayes classifiers are highly scalable: the number of parameters they
require grows only linearly with the number of variables (features/predictors) in a learning problem.
Maximum-likelihood training can be done by evaluating a closed-form expression in linear time, a
major advantage over the costly iterative approximation used by many other classifiers. In the
statistics and computer science literature, naive Bayes models are also referred to as simple Bayes and
independence Bayes. All of these names refer to the use of Bayes' theorem in the classifier's decision
rule, but naive Bayes is not (necessarily) a Bayesian method [131-133].


Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to
problem instances represented as vectors of feature values, where the class labels are drawn from a
finite set [134]. There is no single algorithm for training such classifiers, but rather a family of
algorithms that share a common principle: all naive Bayes classifiers assume that, given the class
variable, the value of a particular feature is independent of the value of any other feature [135]. For
example, a fruit may be considered an apple if it is red, round, and roughly 10 cm in diameter. A naive
Bayes classifier treats each of these features as contributing independently to the probability that the
fruit is an apple, regardless of any possible correlations between colour, roundness, and diameter
[136].
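
The independence assumption can be made concrete with a small Gaussian naive Bayes sketch in base MATLAB. The feature values (cell area in pixels and mean intensity), the class labels, and the choice of a Gaussian likelihood are all assumptions made for illustration; this article does not specify which likelihood model was used. Training reduces to computing per-class means, variances, and priors, and prediction simply adds up per-feature log-likelihoods.

```matlab
% Hypothetical training data: one row per cell, columns are
% [area in pixels, mean intensity]. Labels: 1 = normal, 2 = abnormal.
X = [310 0.62; 295 0.60; 330 0.65; 520 0.41; 560 0.38; 540 0.44];
y = [1; 1; 1; 2; 2; 2];

classes = unique(y);
nClasses = numel(classes);
nFeatures = size(X, 2);
mu = zeros(nClasses, nFeatures);
sigma2 = zeros(nClasses, nFeatures);
prior = zeros(nClasses, 1);

% Training: closed-form estimates of mean, variance and prior per class.
for k = 1:nClasses
    Xk = X(y == classes(k), :);
    mu(k, :) = mean(Xk, 1);
    sigma2(k, :) = var(Xk, 0, 1);
    prior(k) = size(Xk, 1) / size(X, 1);
end

% Prediction: under the independence assumption the per-feature
% Gaussian log-likelihoods are summed, then the log prior is added.
xNew = [500 0.45];
logPost = zeros(nClasses, 1);
for k = 1:nClasses
    logLik = -0.5 * log(2 * pi * sigma2(k, :)) ...
             - (xNew - mu(k, :)).^2 ./ (2 * sigma2(k, :));
    logPost(k) = log(prior(k)) + sum(logLik);
end
[~, best] = max(logPost);
predicted = classes(best);
```

The Statistics and Machine Learning Toolbox provides fitcnb for the same purpose; the hand-written version is shown only to make the independence assumption visible.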


For some types of probability model, naive Bayes classifiers can be trained very efficiently in a
supervised learning setting [137]. Because parameter estimation for naive Bayes models often uses the
method of maximum likelihood, it is possible to work with the naive Bayes model without accepting
Bayesian probability or using any Bayesian methods. Despite their simple design, naive Bayes
classifiers have worked well in many complex real-world situations [138]. In 2004, an analysis of the
Bayesian classification problem found sound theoretical reasons for the apparently implausible
effectiveness of naive Bayes classifiers. Nevertheless, a comprehensive comparison made in 2006
showed that Bayes classification is outperformed by other methods, such as boosted trees and random
forests. An advantage of naive Bayes as a classification method is that it requires only a small amount
of training data to estimate the parameters needed for classification [139].


Support Vector Machine (SVM):


The SVM seeks the optimal separating hyperplane in a high-dimensional space. In general, more than
one hyperplane can separate the classes; the SVM selects the one with the largest margin. The support
vectors, which are the data points closest to the decision surface, are essential to this procedure
because they determine its position. The SVM accomplishes classification by projecting the input
vectors into a high-dimensional space and constructing a hyperplane there to partition the data.
Training is most commonly posed as a quadratic programming problem. Among the classifiers
considered in this study, SVM performs best.
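
The idea of a separating hyperplane can be illustrated with a minimal linear SVM in base MATLAB. This is a sketch under stated assumptions, not the training procedure used in this study: it replaces the quadratic programming solver with plain subgradient descent on the regularised hinge loss, and the data points, regularisation strength lambda, and step size eta are hypothetical. The Statistics and Machine Learning Toolbox function fitcsvm is the usual way to train an SVM in MATLAB.

```matlab
% Hypothetical, linearly separable 2-D feature vectors (one row per cell).
X = [-2 -1; -1 -2; -1.5 -1.5; 2 1; 1 2; 1.5 1.5];
y = [-1; -1; -1; 1; 1; 1];          % class labels must be -1 or +1

lambda = 0.01;                      % regularisation strength (assumed)
eta = 0.1;                          % step size (assumed)
w = zeros(size(X, 2), 1);           % normal vector of the hyperplane
b = 0;                              % offset of the hyperplane

for epoch = 1:200
    margins = y .* (X * w + b);
    viol = margins < 1;             % points inside or beyond the margin
    % Subgradient of lambda/2*||w||^2 + mean(max(0, 1 - margins)).
    gradW = lambda * w - (X(viol, :)' * y(viol)) / numel(y);
    gradB = -sum(y(viol)) / numel(y);
    w = w - eta * gradW;
    b = b - eta * gradB;
end

% Each point is classified by the side of the hyperplane w'*x + b = 0
% on which it falls.
predicted = sign(X * w + b);
```

The points whose margin stays at or below 1 are the ones that drive the updates, which mirrors the role of support vectors described above.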


Data Dictionary 


A data dictionary is an essential part of a database. It holds metadata, that is, data about data,
including information about the database itself. The data dictionary contains the actual database
descriptions used by the DBMS, and in most DBMSs it is an active, operational component: every
time the database is accessed, the database management system first consults its data dictionary.
Because a database is designed to be built and used by a number of people, it can be difficult to ensure
that everyone understands what information is allowed in each field. A data dictionary is therefore a
useful aid to data consistency. There is at present no single standard for creating a data dictionary, and
different tables have different metadata. The only requirement is that the data dictionary be easily
searchable.


System Implementation 


The elements at the lowest level of the system hierarchy (system breakdown structure) are created
during the implementation phase. System elements can be made, bought, or reused. Production
involves the hardware fabrication processes of forming, removing, joining, and finishing; the software
realisation processes of coding and testing; or the development of operational procedures for operator
roles. If implementation involves a production process, a manufacturing system that follows
established technical and management processes may be required.


The purpose of the implementation process is to design and create (or fabricate) a system element that
conforms to the element's design properties and/or requirements. The element is constructed using
standard industrial techniques. The implementation process bridges the system definition processes
and the integration process.

Implementation is the phase of a project in which the design on paper is turned into a working system.
The most important parts are completing the system successfully and giving the user confidence that
the new system will work well. The existing system was time-consuming to use and caused long
delays in communication. The proposed system was developed in MATLAB and provides a
user-friendly tool with a menu-based, graphical interface. Once coding and testing are complete, the
project is installed on the target system: an executable file is generated and run, and the code is tested
again in the production environment.


System Testing 


The purpose of testing is to uncover flaws so that they can be fixed. Testing tries to find every
conceivable fault or weakness in a work product. It provides a way to check the functionality of
individual components, assemblies, and/or the finished product. The goal of software testing is to find
potential problems in a program before they become major issues for the end user. There are several
kinds of test, and each addresses a particular testing requirement (Figures 3 to 5).

Figure 3: Pre-processing
Figure 4: Segmentation


Published under an exclusive license by open access journals under Volume: 3 Issue: 1 in Jan-2023 
Copyright (c) 2023 Author (s). This is an open-access article distributed under the terms of Creative Commons Attribution 
License (CC BY).To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/ 




Figure 5: Classification (output label shown in the figure: "Abnormal - Benign Stage")


Conclusion 


In this research, we model WBC cancer as a classification task and describe how a neural network
was used to determine whether or not the disease was malignant. The outputs of both neural networks
were compared with the accuracy of the existing system. In terms of accuracy and precision, the
neural network technique used for classification in this study was found to be more effective than
previously used techniques. The results show that the SVM algorithm outperforms other methods
currently used to detect WBC cancer. A given image may contain a large number of RBCs, fluid, and
WBCs, and the proportion of RBCs and fluid is greater than the proportion of WBCs; the proposed
method therefore reports the minimum number of pixels identified. Accordingly, the area affected by
cancer is taken to be the smallest subset of the whole area covered by coloured pixels. The proposed
approach has the potential to be very useful for early detection of blood cancer, and more generally
for early cancer detection, when the number of malignant cells is expected to be at its lowest. The
following observations can guide future work. Where images are clear and the cells are large enough
to be seen by eye, the number of cells counted manually is identical to the automated count. Where
white blood cells (WBCs) are not clearly delineated, the number of cells counted may not be
consistent between scans, because such cells cannot be seen and picked out manually. In these cases
the WBC count obtained with colour-based clustering is taken as the correct one.